Module 02 - Microservices with Python

The Monolith That Broke on a Tuesday

It starts innocently. Your document processing application handles uploads, runs OCR, classifies documents, sends notifications, and stores results - all in one Flask app. Deploys take 40 minutes. A bug in the notification code brings down OCR. A memory leak in the classification model crashes the upload handler. You cannot scale the CPU-heavy processing tier independently of the lightweight notification sender.

Then Tuesday comes. A spike in document uploads saturates the OCR workers. The entire application becomes unresponsive. Users cannot even check their upload history - that endpoint lives in the same overloaded process.

This module is about what you build instead, and more importantly, how and when to build it.

What You Will Learn

Lesson	Topic	Key Skills Gained
01	FastAPI in Depth	DI patterns, lifespan events, middleware, exception handlers, OpenAPI customisation
02	gRPC with Python	Protocol Buffers, all four streaming patterns, interceptors, error mapping
03	Event-Driven Architecture	Kafka, Redis Streams, Event Sourcing, CQRS, Saga pattern
04	Service Mesh Patterns	Circuit breakers, retry with jitter, bulkheads, OpenTelemetry tracing
05	API Versioning and Contracts	Pact contract testing, schema evolution, SDK generation, deprecation

Prerequisites: Python async fundamentals (Module 1 of this series), Docker basics, HTTP fundamentals.

Time commitment: ~12 hours of focused study, ~8 hours of hands-on project work.

The Migration: From Monolith to Four Services

The best way to understand microservice boundaries is to watch a real extraction. Here is DocumentProcessingMonolith - a class doing eight things that should never be owned by a single deployable unit.

# BEFORE: The Monolith - one class, eight responsibilities
# Every deployment touches every capability.
# Every bug can affect every user.
# You cannot scale OCR independently of email sending.

from datetime import datetime  # needed for the audit-log timestamp below


class DocumentProcessingMonolith:
    def __init__(self, db_conn, email_client, storage_client, ocr_engine, classifier):
        self.db = db_conn
        self.email = email_client
        self.storage = storage_client
        self.ocr = ocr_engine
        self.classifier = classifier

    def process_document(self, file_bytes: bytes, user_id: str, filename: str) -> dict:
        # Responsibility 1: Validate input
        if len(file_bytes) > 50 * 1024 * 1024:
            raise ValueError("File too large")
        # Compare case-insensitively so "SCAN.PDF" is accepted too
        if not filename.lower().endswith((".pdf", ".png", ".jpg")):
            raise ValueError("Unsupported format")

        # Responsibility 2: Store raw file
        storage_key = f"raw/{user_id}/{filename}"
        self.storage.put(storage_key, file_bytes)

        # Responsibility 3: Run OCR (CPU-heavy, 2–30 seconds)
        text = self.ocr.extract_text(file_bytes)

        # Responsibility 4: Extract metadata
        metadata = self._extract_metadata(file_bytes)

        # Responsibility 5: Generate thumbnail (also CPU-heavy)
        thumbnail = self._generate_thumbnail(file_bytes)
        self.storage.put(f"thumbs/{user_id}/{filename}.jpg", thumbnail)

        # Responsibility 6: Classify document (ML inference)
        label = self.classifier.classify(text)

        # Responsibility 7: Write audit log
        self.db.execute(
            "INSERT INTO audit_log(user_id, filename, label, ts) VALUES (%s, %s, %s, %s)",
            (user_id, filename, label, datetime.utcnow()),
        )

        # Responsibility 8: Send notification email
        email = self.db.fetchone(
            "SELECT email FROM users WHERE id = %s", (user_id,)
        )["email"]
        self.email.send(
            to=email,
            subject="Your document is ready",
            body=f"Document '{filename}' classified as: {label}",
        )

        return {"storage_key": storage_key, "label": label, "text": text[:500]}

The problems are architectural, not stylistic. You cannot:

Scale OCR (4 CPUs minimum) without also scaling email sending (almost zero CPU).
Deploy a new ML classification model without touching the upload handler.
Upgrade notification templates without running the full OCR regression suite.
Have the ML team own the classifier independently of the infra team owning storage.
Isolate a memory leak in the classifier from affecting uploads.

AFTER: Four Services with Clear Boundaries

┌──────────────────────────────────────────────────────────────────────┐
│                       API Gateway / Nginx                             │
│             (TLS termination, auth, rate limiting, routing)           │
└──────────────────┬────────────────────────────────────┬──────────────┘
                   │ REST/JSON                           │ REST/JSON
                   ▼                                    ▼
    ┌──────────────────────────┐        ┌────────────────────────────┐
    │      Upload Service      │        │   Classification Service    │
    │   FastAPI · Python       │        │   FastAPI + gRPC · Python  │
    │                          │        │                            │
    │  • File validation       │        │  • ML model serving        │
    │  • Virus scanning        │        │  • Text → label mapping    │
    │  • S3/GCS storage        │        │  • Confidence scores       │
    │  • Publishes:            │        │  • Model version tracking  │
    │    doc.uploaded event    │        │  • gRPC for internal calls │
    │                          │        │                            │
    │  CPU: low   RAM: 256 MB  │        │  CPU: high   RAM: 2 GB     │
    │  Replicas: 2             │        │  Replicas: 4               │
    └──────────────┬───────────┘        └────────────────┬───────────┘
                   │                                      │
                   │     ┌────────────────────────┐      │
                   │     │     Message Broker      │      │
                   └────►│        Kafka            │◄─────┘
                         │                         │
                         │  Topics:                │
                         │  • doc.uploaded         │
                         │  • doc.processed        │
                         │  • doc.classified       │
                         └────────────┬────────────┘
                                      │
               ┌──────────────────────┴─────────────────────┐
               ▼                                             ▼
  ┌────────────────────────────┐         ┌─────────────────────────────┐
  │     Processing Service     │         │     Notification Service     │
  │   FastAPI · Python         │         │   FastAPI · Python           │
  │                            │         │                              │
  │  • OCR (pytesseract)       │         │  • Email (SendGrid)          │
  │  • PDF text extraction     │         │  • SMS (Twilio)              │
  │  • Thumbnail generation    │         │  • In-app push notifications │
  │  • Metadata extraction     │         │  • User preference lookup    │
  │  • Publishes:              │         │  • Template rendering        │
  │    doc.processed event     │         │                              │
  │                            │         │  CPU: low    RAM: 128 MB     │
  │  CPU: very high  RAM: 1 GB │         │  Replicas: 1                 │
  │  Replicas: 8               │         │                              │
  └────────────────────────────┘         └─────────────────────────────┘

Each service:

Owns its own data store - no shared database
Deploys independently on its own CI/CD pipeline
Scales based on its own resource bottleneck
Is owned by one team with full autonomy
Can be rewritten or replaced without affecting other services

This is the architecture you will build, piece by piece, across this module.
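
With no shared database, the events on the broker become the contract between services. The sketch below shows one possible shape for a doc.uploaded event. The key, user_id and filename fields come from the upload example in this module; the envelope fields (event_id, occurred_at, schema_version) and the class name DocUploaded are assumptions added for illustration, not a format the later lessons prescribe.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class DocUploaded:
    """Hypothetical payload for the doc.uploaded topic."""

    key: str
    user_id: str
    filename: str
    # Assumed envelope fields: they let consumers deduplicate redelivered
    # events and detect payloads written by an older producer.
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    occurred_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    schema_version: int = 1

    def to_bytes(self) -> bytes:
        """Serialise to UTF-8 JSON, the form a broker message body would carry."""
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")

    @classmethod
    def from_bytes(cls, raw: bytes) -> "DocUploaded":
        """Rebuild the event on the consumer side."""
        return cls(**json.loads(raw.decode("utf-8")))


event = DocUploaded(key="raw/u1/invoice.pdf", user_id="u1", filename="invoice.pdf")
round_tripped = DocUploaded.from_bytes(event.to_bytes())
print(round_tripped == event)
```

Keeping the event a plain, versioned data structure means the Processing and Notification services depend only on this payload, never on the Upload Service's tables.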

When to Use Microservices vs Monolith

This is the most consequential decision in this module. Most teams reach for microservices too early, creating distributed monoliths - all the operational complexity of distributed systems, with none of the independent deployment or scaling benefits.

The Decision Matrix

Factor	Choose Monolith	Choose Microservices
Team size	Under 8 engineers	Multiple teams, 15+ engineers
Deployment frequency	Infrequent, coordinated	Teams deploy independently, 20+ times/day
Scale requirements	Roughly uniform	Wildly different per component
Domain clarity	Still discovering the model	Well-understood bounded contexts
Operational maturity	No Kubernetes expertise	Strong DevOps culture, observability in place
Data ownership	Shared DB acceptable	Clear ownership, teams own their schema
Development speed	Ship an MVP fast	Independent team velocity matters

The rule: If your team cannot draw bounded contexts on a whiteboard without a 30-minute argument, you are not ready for microservices.

The Recommended Path: Modular Monolith First

Start with a modular monolith - well-separated modules inside one deployable, with strict interface boundaries. Extract to services when you have evidence of the need.

# A modular monolith: one deployment, clean internal interfaces
# When you later extract to a service, you only change the adapter

# upload/ports.py - defines what upload module needs from outside
from abc import ABC, abstractmethod

class StoragePort(ABC):
    @abstractmethod
    async def put(self, key: str, data: bytes) -> str: ...

class EventPort(ABC):
    @abstractmethod
    async def publish(self, topic: str, event: dict) -> None: ...

# upload/service.py - business logic, no infrastructure knowledge
class UploadService:
    def __init__(self, storage: StoragePort, events: EventPort):
        self._storage = storage
        self._events = events

    async def upload(self, file_bytes: bytes, filename: str, user_id: str) -> str:
        key = f"raw/{user_id}/{filename}"
        await self._storage.put(key, file_bytes)
        await self._events.publish("doc.uploaded", {
            "key": key, "user_id": user_id, "filename": filename
        })
        return key

# In a monolith: events are in-process function calls
# When you extract: events become Kafka messages
# The UploadService code does not change - only the EventPort adapter changes

The EventPort abstraction is the key. Today it calls a function in-process. Tomorrow it sends a Kafka message. The service logic is identical.
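
To make the adapter swap concrete, here is a minimal sketch of the in-process side. StoragePort, EventPort and UploadService are repeated from the listing above so the block runs on its own; InMemoryStorage, InProcessEventBus and its subscribe method are hypothetical names introduced only for this illustration.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Awaitable, Callable

Handler = Callable[[dict], Awaitable[None]]


class StoragePort(ABC):
    @abstractmethod
    async def put(self, key: str, data: bytes) -> str: ...


class EventPort(ABC):
    @abstractmethod
    async def publish(self, topic: str, event: dict) -> None: ...


class InMemoryStorage(StoragePort):
    """Stand-in for S3/GCS: keeps blobs in a dict."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}

    async def put(self, key: str, data: bytes) -> str:
        self.blobs[key] = data
        return key


class InProcessEventBus(EventPort):
    """Monolith adapter: publishing is just awaiting local handlers."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = {}

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers.setdefault(topic, []).append(handler)

    async def publish(self, topic: str, event: dict) -> None:
        for handler in self._handlers.get(topic, []):
            await handler(event)


class UploadService:
    def __init__(self, storage: StoragePort, events: EventPort):
        self._storage = storage
        self._events = events

    async def upload(self, file_bytes: bytes, filename: str, user_id: str) -> str:
        key = f"raw/{user_id}/{filename}"
        await self._storage.put(key, file_bytes)
        await self._events.publish(
            "doc.uploaded", {"key": key, "user_id": user_id, "filename": filename}
        )
        return key


async def main() -> list[dict]:
    received: list[dict] = []

    async def on_uploaded(event: dict) -> None:
        received.append(event)  # the processing module would start OCR here

    bus = InProcessEventBus()
    bus.subscribe("doc.uploaded", on_uploaded)
    service = UploadService(InMemoryStorage(), bus)
    await service.upload(b"%PDF-1.7", "invoice.pdf", "u1")
    return received


print(asyncio.run(main()))
```

A Kafka-backed adapter would implement the same publish signature with a producer call, and nothing in UploadService or its tests would need to change.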

CAP Theorem: The Constraint Every Microservice Engineer Must Internalize

In a distributed system, you can guarantee at most two of these three properties simultaneously:

Property	Meaning	Real-World Definition
Consistency	Every read reflects the latest write	All nodes return the same data at the same moment
Availability	Every request receives a response	System responds even when some nodes fail
Partition Tolerance	System operates despite network failures	Continues working when the network splits nodes

Network partitions are inevitable in any distributed system, so partition tolerance is not optional. The practical choice is between C and A for as long as a partition lasts:

CP systems (Consistency + Partition Tolerance): Refuse to answer during a partition rather than return stale data. Examples: HBase, ZooKeeper, etcd, PostgreSQL configured with synchronous replication. Use for: distributed locks, financial transactions, leader election.
AP systems (Availability + Partition Tolerance): Return potentially stale data rather than refuse to respond. Examples: Cassandra, DynamoDB, CouchDB. Use for: shopping carts, user sessions, activity feeds, search indexes.
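
The difference between the two choices can be shown with a toy model. This is a deliberately simplified sketch, not how any of the databases named above is implemented: Replica, replicate and PartitionError are hypothetical names, and a single boolean stands in for a real network partition.

```python
from dataclasses import dataclass


class PartitionError(Exception):
    """Raised by a CP replica that cannot confirm it holds the latest write."""


@dataclass
class Replica:
    mode: str                  # "CP" or "AP"
    value: str = "v1"          # last value this replica has seen
    partitioned: bool = False  # True while cut off from the leader

    def read(self) -> str:
        if self.partitioned and self.mode == "CP":
            # Consistency first: an error is better than a stale answer.
            raise PartitionError("cannot reach leader; refusing to serve")
        # Availability first (or no partition): answer with what we have.
        return self.value


def replicate(replica: Replica, new_value: str) -> None:
    """The leader pushes a write; it only arrives if the link is up."""
    if not replica.partitioned:
        replica.value = new_value


cp = Replica(mode="CP", partitioned=True)
ap = Replica(mode="AP", partitioned=True)
for replica in (cp, ap):
    replicate(replica, "v2")  # both replicas miss the write

print("AP read:", ap.read())  # answers, but with the stale value
try:
    cp.read()
except PartitionError as exc:
    print("CP read failed:", exc)
```

Once the partition heals and the write is replicated, both replicas return the same value again; the two modes differ only in how they behave while the split lasts.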

In the document intelligence platform:

Data	Choice	Reason
Classification label in read model	AP	Slight staleness is fine; user sees updated label within seconds
Billing record for processing charge	CP	Must be accurate; better to show an error than charge incorrectly
Audit log entries	AP with eventual consistency	All entries arrive eventually; availability matters more than immediate consistency
User session token	AP	Returning a slightly stale token is better than login failing

The 8 Fallacies of Distributed Computing

L. Peter Deutsch and colleagues at Sun Microsystems catalogued the first seven of these false assumptions; James Gosling later added the eighth. Each one will cause production incidents if ignored.

#	Fallacy	The Truth	How to Defend Against It
1	The network is reliable	Packets are dropped, connections reset, routers fail	Retries with exponential backoff; idempotent operations
2	Latency is zero	Cross-datacenter calls: 5–100 ms; cross-pod: 0.5–2 ms	Async messaging; connection pooling; caching; batching
3	Bandwidth is infinite	Large payloads are expensive and slow	Pagination; compression (gzip/brotli); streaming; binary protocols
4	The network is secure	Traffic can be intercepted, replayed, spoofed	mTLS between services; service-to-service JWT auth; encryption at rest
5	Topology doesn't change	IPs change when pods restart; services autoscale in and out	DNS-based service discovery; health checks; graceful connection draining
6	There is one administrator	Multiple teams deploy conflicting changes simultaneously	API contracts; consumer-driven contract testing; feature flags
7	Transport cost is zero	Serialisation, TLS handshakes, HTTP overhead all accumulate	gRPC (binary, multiplexed) for high-frequency internal calls
8	The network is homogeneous	Different languages, OS, protocol versions, MTU sizes	Standard protocols (HTTP/2, gRPC); schema registries; protocol buffers

By the end of this module, you will have written Python code that defends against all eight.
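
As a preview of the defence against fallacy 1, the sketch below combines retries using exponential backoff and full jitter with an idempotent operation. It uses only the standard library rather than tenacity so that the mechanics are visible. retry_with_backoff, FlakyBilling and the idempotency key "doc-42" are hypothetical names chosen for this illustration.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    *,
    attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call` until it succeeds or `attempts` tries are used up."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Full jitter: wait a random time up to the exponential cap so
            # that many clients do not retry in lockstep.
            cap = min(max_delay, base_delay * 2**attempt)
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")


class FlakyBilling:
    """Fake remote service: applies the charge, then loses the response."""

    def __init__(self, failures: int) -> None:
        self.failures_left = failures
        self.charges: dict[str, int] = {}

    def charge(self, idempotency_key: str, amount_cents: int) -> str:
        # Idempotent: a repeated key never creates a second charge.
        self.charges.setdefault(idempotency_key, amount_cents)
        if self.failures_left > 0:
            self.failures_left -= 1
            raise ConnectionError("response lost")
        return idempotency_key


billing = FlakyBilling(failures=2)
delays: list[float] = []  # record the waits instead of really sleeping
result = retry_with_backoff(
    lambda: billing.charge("doc-42", 500), sleep=delays.append
)
print(result, len(delays), billing.charges)
```

The retry is only safe because the operation is idempotent: the first two calls did reach the service, and without the key the customer would have been charged three times.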

How Python Fits Into Polyglot Microservice Architectures

Python is rarely the only language in a mature microservice shop. Understanding how Python services interoperate with Go, Java, and Rust services is essential.

                    Polyglot Production Architecture
                    ─────────────────────────────────

  ┌──────────────────┐   gRPC (proto)   ┌──────────────────────────┐
  │   API Gateway    │─────────────────►│   Auth Service (Go)       │
  │  Python / FastAPI│                  │   ~1 ms latency, 64 MB RAM│
  └────────┬─────────┘                  └──────────────────────────┘
           │
           │  REST / JSON
           ▼
  ┌──────────────────┐   Kafka events   ┌──────────────────────────┐
  │ Document Service │─────────────────►│  ML Pipeline (Python)     │
  │ Python / FastAPI │                  │  PyTorch inference server  │
  └──────────────────┘                  └──────────────────────────┘
           │
           │  REST / JSON
           ▼
  ┌──────────────────┐   gRPC (proto)   ┌──────────────────────────┐
  │  Search Service  │─────────────────►│  Index Builder (Java)     │
  │ Python / FastAPI │                  │  Lucene, needs heavy JVM  │
  └──────────────────┘                  └──────────────────────────┘

Python wins in microservice architectures for:

ML and data pipelines (PyTorch, scikit-learn, pandas have no peer)
API gateway services (FastAPI on uvicorn is usually fast enough for I/O-bound work, where time is spent waiting on the network rather than in the interpreter)
Scripting and orchestration (calling and coordinating other services)
Rapid prototyping (fastest path from idea to deployed service)

Python loses to Go/Rust when:

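CPU-intensive hot paths need true parallelism (on standard CPython builds the GIL lets only one thread run Python bytecode at a time)
Memory is severely constrained (a Python service with its framework loaded typically carries a ~30–50 MB baseline before it handles a request)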
Connection counts exceed ~10,000 concurrent (Go goroutines outperform asyncio at this scale)

The practical pattern: Python for business logic and ML inference; Go or Rust for the network-critical, CPU-intensive hot paths; all connected via gRPC and Kafka.

Module Project: The Document Intelligence Platform

Each lesson constructs one piece of a complete, deployable Document Intelligence Platform.

Lesson	Service Built	Technology Highlighted
01 - FastAPI in Depth	Upload Service	DI, lifespan, middleware, background tasks
02 - gRPC with Python	Classification Service	Proto definition, streaming RPC, interceptors
03 - Event-Driven Architecture	Event backbone	Kafka topics, event sourcing, saga pattern
04 - Service Mesh Patterns	Resilience layer	Circuit breakers, tracing, health checks
05 - API Versioning and Contracts	Versioned API	Pact tests, schema evolution, client SDK

The code in each lesson is production-quality. It handles errors, logs correctly, and is structured for testability. You can deploy it.

Environment Setup

# Project structure
mkdir -p doc-intelligence/{upload-service,classification-service,processing-service,notification-service,shared,protos}
cd doc-intelligence

# Python environment
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# Core service dependencies
pip install fastapi "uvicorn[standard]" httpx pydantic "pydantic-settings"
pip install grpcio grpcio-tools protobuf
pip install kafka-python confluent-kafka redis
pip install opentelemetry-api opentelemetry-sdk
pip install opentelemetry-instrumentation-fastapi
pip install opentelemetry-exporter-otlp
pip install tenacity pact-python

# Infrastructure via Docker Compose
cat > docker-compose.yml << 'YAML'
version: "3.9"

services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.6.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181

  kafka:
    image: confluentinc/cp-kafka:7.6.0
    depends_on: [zookeeper]
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
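      KAFKA_AUTO_CREATE_TOPICS_ENABLE: "true"
      # Single broker: the default offsets-topic replication factor of 3 cannot be met
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1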

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"

  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_DB: documents
      POSTGRES_USER: admin
      POSTGRES_PASSWORD: secret
    ports:
      - "5432:5432"

  jaeger:
    image: jaegertracing/all-in-one:1.55
    ports:
      - "16686:16686"   # Jaeger UI
      - "4317:4317"     # OTLP gRPC receiver
      - "4318:4318"     # OTLP HTTP receiver
YAML

docker compose up -d
echo "Containers started; services may need a few seconds before they accept connections."
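
Because docker compose up -d returns as soon as the containers are created, it can help to confirm that the published ports accept connections before starting a lesson. The following is a minimal sketch using only the standard library. The port numbers are the host ports from the Compose file above; port_open and wait_for are hypothetical helper names, and an open TCP port is only a rough signal, since a service can accept connections before it is fully initialised.

```python
import socket
import time

# Host ports published by the Compose file above.
SERVICES = {"kafka": 9092, "redis": 6379, "postgres": 5432, "jaeger-ui": 16686}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for(
    services: dict[str, int],
    host: str = "localhost",
    deadline_s: float = 60.0,
    interval_s: float = 1.0,
) -> dict[str, bool]:
    """Poll every service until all ports are open or the deadline passes."""
    end = time.monotonic() + deadline_s
    status = {name: False for name in services}
    while True:
        for name, port in services.items():
            if not status[name]:
                status[name] = port_open(host, port)
        if all(status.values()) or time.monotonic() >= end:
            return status
        time.sleep(interval_s)


def report() -> None:
    """Print one line per service; call this after `docker compose up -d`."""
    for name, ok in wait_for(SERVICES).items():
        print(f"{name:10s} {'up' if ok else 'NOT reachable'}")
```

Save the block as a script, add a call to report() at the bottom, and run it from the project directory.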

Navigating This Module

Each lesson is self-contained - you can study gRPC without having read the FastAPI lesson. But the project connects them all. Recommended approach:

Read the lesson once, end to end, skimming code to understand structure
Reproduce every code example from scratch (not copy-paste) - this is where understanding forms
Build the mini-project at the end of each lesson
Integrate your service with the one built in the previous lesson

The fastest path to mastering distributed systems is building a small one and watching it fail in interesting ways.

Let's begin.
